Abstract

The accuracy of a diagnostic test is typically characterised using the receiver
operating characteristic (ROC) curve. Summarising indexes such as the area
under the ROC curve (AUC) are used to compare different tests as well as to
measure the difference between two populations. Often additional information
is available on some of the covariates which are known to influence the accuracy
of such measures. We propose nonparametric methods for covariate adjustment
of the AUC. Models with normal errors and non-normal errors are discussed
and analysed separately. Nonparametric regression is used for estimating mean
and variance functions in both scenarios. In the general noise case we propose
a covariate-adjusted Mann-Whitney estimator for AUC estimation which
effectively uses available data to construct working samples at any covariate
value of interest and is computationally efficient to implement. This provides a
generalisation of the Mann-Whitney approach for comparing two populations
by taking covariate effects into account. We derive asymptotic properties for
the AUC estimators in both settings, including asymptotic normality, optimal
strong uniform convergence rates and MSE consistency. The usefulness of the
proposed methods is demonstrated through simulated and real data examples.

Keywords: Area Under Curve, Asymptotics, Covariate Adjustment, Mann-Whitney,
Nonparametric, Smoothing, Uniform Convergence

1 Introduction 

The receiver operating characteristic (ROC) curve is a commonly used tool for
summarizing the accuracy of a test with binary results. The sensitivity, or true
positive rate, of a binary test is the probability that a truly diseased subject is
diagnosed as diseased. The specificity, which equals one minus the false positive
rate, is defined as the probability that a healthy subject produces a negative test.
Suppose that the result of a test is a random variable $Y$; depending on whether
$Y < c$ or $Y > c$ the test result is considered negative or positive, respectively.
If the distribution of $Y$ is continuous, each value of the threshold $c$ will
correspond to different sensitivity and specificity values. In general the ROC curve
summarizes how well two populations can be separated by a specified variable.
Frequently a number of tests (a.k.a. markers or classifiers) are performed on each
individual subject. A global univariate summary of the corresponding ROC curve is
used to determine which classifier is more accurate. A number of such summaries are
available but the most commonly used one is the area under the ROC curve (AUC). The
AUC can be interpreted as the probability that a randomly chosen diseased subject
will have a marker value greater than that of a randomly chosen nondiseased subject
and can be used as an alternative measure of difference between two populations
(e.g. Zhou et al., 2002). Its range of application extends from medical applications
to reliability theory (Reiser and Guttman, 1986).
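
As a minimal sketch of this probabilistic interpretation (an illustration only, not
taken from the paper), the empirical counterpart of $P(Y > X)$ can be computed by
comparing every diseased marker value with every nondiseased one. The function names
`empirical_auc` and `sens_spec` and the toy data are hypothetical; ties are counted
as successes, which matches an indicator of $[0, \infty)$ applied to $y_j - x_i$.

```python
import numpy as np

def empirical_auc(x, y):
    """Fraction of (nondiseased x_i, diseased y_j) pairs with y_j >= x_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rows index diseased subjects, columns index nondiseased subjects.
    return float(np.mean(y[:, None] >= x[None, :]))

def sens_spec(x, y, c):
    """Empirical sensitivity P(Y > c) and specificity P(X < c) at threshold c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(y > c)), float(np.mean(x < c))

x = [0.1, 0.4, 0.5, 0.9]   # hypothetical nondiseased marker values
y = [0.3, 0.8, 1.0, 1.2]   # hypothetical diseased marker values
auc = empirical_auc(x, y)
sens, spec = sens_spec(x, y, c=0.6)
```

Sweeping the threshold `c` over the pooled marker values and plotting sensitivity
against one minus specificity would trace out the empirical ROC curve.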

ROC curves have become ubiquitous in medical studies (Metz, 1989; Hsiao et al.,
1989; Aoki et al., 1997; Otto et al., 1998; Stover et al., 1996; Zhou et al., 2002),
their use being spurred by the now classic text of Swets and Pickett (1982).
Parametric and nonparametric methods for estimating individual ROC curves
are available as well as methods that do not assume independent observations (Begg,
1991; DeLong et al., 1988; Molodianovitch et al., 2006; Pepe, 2003).

In a large number of situations, additional information is available in the form
of covariates which are known to influence the accuracy of the test. Only recently
have statistical methods been devised to incorporate such information in the
ROC-based analysis. Some of the earlier methods were developed by Thompson and
Zucchini (1989), Obuchowski (1995), Tosteson and Begg (1988) and Toledano and
Gatsonis (1995). Pepe (1997) formulated a general regression framework to model the
dependence of the ROC curve directly on the covariates. Pepe (2000) and Dodd and
Pepe (2003) propose semiparametric approaches to model the ROC and AUC directly
using generalized linear models. Cai and Pepe (2002) extend the parametric ROC
regression model by allowing an arbitrary nonparametric baseline function. Cai (2004)
finds a more efficient estimator in the semiparametric setting. Brumback et al. (2006)
used an alternative procedure by applying a generalized regression framework directly
to the AUC in order to adjust the Mann-Whitney test for covariates. However, this
approach loses the connection with the threshold value, does not allow the prediction
of the sensitivity and specificity at a given threshold conditional on covariates, and
does not model covariate effects on the individual marker values. Consequently we
prefer to directly model the covariate effects on the marker values and through this
modeling process obtain the analyses of interest.

The methods proposed in this paper fall within the first category of methods
described in Pepe (1998). We propose a nonparametric approach to adjusting the
computation of the AUC and other ROC-related quantities of interest for covariates.
The main motivation for our method is its robustness to the model mis-specification
which may beset a parametric adjustment. We thus generalize in two ways the
approaches of Faraggi (2003) and Schisterman et al. (2006), who use normal regression
models to adjust the index AUC for covariates. We describe the regression model,
distinguishing between the normal noise assumption and the general noise assumption.
In a first extension of previous work, we estimate the mean and variance functions
using nonparametric regression techniques, more specifically, local polynomial
regression instead of parametric linear models. Our main contribution, leading to the
second extension, is to construct a covariate-adjusted Mann-Whitney estimator (CAMWE)
in the general noise case, which relies on working samples created at any possible
covariate value $Z = z$ of interest for the estimation of the AUC. Such working
samples have, for any $Z = z$, the same size as the original sample and can be used
to estimate a number of covariate-adjusted characteristics of the ROC curve. In
practice the computation is kept minimal by utilizing the estimated mean and variance
functions for all $Z = z$ of interest. We recommend bootstrapping in order to obtain
confidence intervals for the covariate-adjusted AUC. Although we focus on
covariate-adjusted AUC estimation, the proposed methods can be readily extended to
other measures related to ROC curves, e.g., the covariate-adjusted specificity,
sensitivity and Youden Index (Youden, 1950).



A theoretical investigation provides asymptotic results for both the normal noise
and general noise models. The asymptotic normality and optimal strong uniform
convergence rates for the covariate-adjusted AUC estimators for normal noise are
established. For the general noise distribution we first derive asymptotic normality
of the "hypothetical" CAMWE and then characterize the asymptotic behavior of
the Mean Squared Error (MSE) of the CAMWE. We performed simulations under
a number of scenarios to demonstrate the effectiveness and robustness of the
proposed estimators as well as the validity of the bootstrap scheme for confidence
band construction.

2 Model and Estimation 

2.1 Regression Model 

To motivate our proposal, we first note that parametric methods are used mainly for
simple interpretation but may mis-specify the correct model forms, while
nonparametric models provide an alternative solution and are more robust and
data-adaptive. We attempt to achieve robustness from two perspectives. First, we do
not assume any parametric forms for the mean and variance functions of the test
response variables, $X$ for nondiseased individuals and $Y$ for diseased individuals.
Although we refer to "diseased" and "nondiseased" groups, the above framework applies
to any two populations of interest. We utilize the nonparametric regression models



$$X \mid Z = f(Z) + \sqrt{v_1(Z)}\,\varepsilon_1, \tag{1}$$

$$Y \mid Z = g(Z) + \sqrt{v_2(Z)}\,\varepsilon_2, \tag{2}$$

where $Z$ denotes the covariate, the standardized errors $\varepsilon_1$ and
$\varepsilon_2$ are independent of each other with zero mean and unit variance, and
the variance functions satisfy $0 < v_1(z) < \infty$ and $0 < v_2(z) < \infty$ for
all $z \in \mathbb{R}$. Note that the errors here can depend heteroscedastically on
the covariate $Z$ through $v_1$ and $v_2$. Second, we do not assume specific
distributions for the noise, in order to guard against mis-specification of the
error distributions. Denote the conditional cumulative distribution functions
(c.d.f.s) of $X$ and $Y$ given $Z$ by $F(\cdot \mid Z)$ and $G(\cdot \mid Z)$, and
the c.d.f.s of $\varepsilon_1$ and $\varepsilon_2$ by $F^*(\cdot)$ and $G^*(\cdot)$.
Here we assume that $F^*$ and $G^*$ do not depend on $Z$, i.e., the dependence of
$X$ and $Y$ on $Z$ is expressed only through $f$, $g$, $v_1$ and $v_2$, which is
equivalent to a location-scale model. It is worth mentioning that, if the response
variable is appropriately chosen at $Z = z$, then marker values of the diseased
sample should be greater than those of the nondiseased sample on average. This is
equivalent to $P(Y > X \mid Z = z) > 0.5$, an assumption implicitly made for the
remainder of the paper. If the baseline distributions $F^*$ and $G^*$ are symmetric
about 0, it implies the assumption $g(z) > f(z)$. In practice, we can simply
constrain all the AUC estimators to be greater than 0.5. This would not affect any
subsequent development due to the consistency of the unrestricted estimators as
presented in Section 3. For notational convenience, we use the unrestricted forms
throughout the paper.

This extends the first type of models discussed by Pepe (1998), where linear
forms were assumed for $f$ and $g$ with variances not depending on the covariate $Z$,
i.e., $g(z) = \alpha_0 + \alpha_1 + (\alpha_2 + \alpha_3)z$, $f(z) = \alpha_0 + \alpha_2 z$,
$v_1(z) = v_1$ and $v_2(z) = v_2$. Note also that, in contrast to Pepe (1998), we do
not require the standardized errors $\varepsilon_1$ and $\varepsilon_2$ to share the
same baseline distribution. Moreover, when the noise is not normally distributed, we
shall propose a new estimator for the area under the ROC curve that extends the
Mann-Whitney estimator to covariate adjustment by using standardized residuals via
the so-called working samples.

2.2 Estimation under Normal Noise Assumption

Let $A(z)$ be the area under the ROC curve with the covariate adjustment $Z = z$.
From models (1) and (2), when the errors $\varepsilon_1$ and $\varepsilon_2$ are
normally distributed, i.e., $F^* = G^* = \Phi$, where $\Phi(\cdot)$ is the c.d.f. of
the standard normal, it is straightforward to derive the following explicit
expression:

$$A_N(z) = P(Y > X \mid Z = z) = \Phi\left\{\frac{g(z) - f(z)}{\sqrt{v_1(z) + v_2(z)}}\right\}, \tag{3}$$

where the subscript "N" stands for the normal assumption. One can also obtain
closed forms of the sensitivity $q_N(z)$ and specificity $p_N(z)$ for $Z = z$,

$$q_N(z) = \Phi\left\{\frac{g(z) - c}{\sqrt{v_2(z)}}\right\}, \qquad p_N(z) = \Phi\left\{\frac{c - f(z)}{\sqrt{v_1(z)}}\right\}, \tag{4}$$

for a given threshold $c$. The ROC curve for the covariate $Z = z$ is the plot of
$q(z)$ versus $1 - p(z)$ for all possible values of $c$, and this can be explicitly
written as

$$q(z) = \Phi\left[\frac{g(z) - f(z) + \sqrt{v_1(z)}\,\Phi^{-1}\{1 - p(z)\}}{\sqrt{v_2(z)}}\right]. \tag{5}$$

The unknown functions $f$, $g$, $v_1$, $v_2$ are estimated by using nonparametric
regression methods as addressed in Section 2.4, providing a "nonparametric
adjustment" as discussed in Section 1.
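
The closed forms under normal noise can be evaluated directly once the values of the
mean and variance functions at a given $z$ are available. The following is a minimal
sketch using only the Python standard library; the function names and the numerical
inputs are hypothetical, and the arguments `f_z`, `g_z`, `v1_z`, `v2_z` stand for
$f(z)$, $g(z)$, $v_1(z)$, $v_2(z)$ (true or estimated) at one fixed covariate value.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cdf plays the role of Phi, inv_cdf of Phi^{-1}

def auc_normal(f_z, g_z, v1_z, v2_z):
    """Closed-form AUC at a fixed covariate value under normal noise."""
    return STD_NORMAL.cdf((g_z - f_z) / sqrt(v1_z + v2_z))

def sensitivity_normal(g_z, v2_z, c):
    """P(Y > c | Z = z) for the diseased group."""
    return STD_NORMAL.cdf((g_z - c) / sqrt(v2_z))

def specificity_normal(f_z, v1_z, c):
    """P(X < c | Z = z) for the nondiseased group."""
    return STD_NORMAL.cdf((c - f_z) / sqrt(v1_z))

def roc_normal(f_z, g_z, v1_z, v2_z, fpr):
    """Sensitivity as a function of the false positive rate, 0 < fpr < 1."""
    num = g_z - f_z + sqrt(v1_z) * STD_NORMAL.inv_cdf(fpr)
    return STD_NORMAL.cdf(num / sqrt(v2_z))

# Hypothetical values at some z: f(z) = 0, g(z) = 1, v1(z) = v2(z) = 1.
auc = auc_normal(0.0, 1.0, 1.0, 1.0)
sens = sensitivity_normal(1.0, 1.0, c=0.5)
spec = specificity_normal(0.0, 1.0, c=0.5)
sens_from_roc = roc_normal(0.0, 1.0, 1.0, 1.0, fpr=1.0 - spec)
```

In this toy setting the sensitivity recovered from the ROC expression at the false
positive rate `1 - spec` agrees with the sensitivity computed directly from the
threshold, which is a quick consistency check between the threshold-based and
ROC-based forms.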



2.3 Estimation under General Noise Assumption 

The assumption of normal noise above simplifies the calculation of the AUC via
(3) but is not always supported by the data. In addition, the normality assumption
hampers the full generality one expects from a nonparametric model. We propose here
a fully nonparametric yet simple estimator of the AUC with covariate adjustment,
$A(z) = P(Y > X \mid Z = z)$, for a general noise distribution.

The proposed estimator is motivated by the classical Mann-Whitney statistic,
which is formulated for two samples $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$ as

$$M_{m,n} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} 1_{[0,\infty)}(y_j - x_i), \tag{6}$$

where $1_{[0,\infty)}(z) = 1$ if $z \ge 0$ and $1_{[0,\infty)}(z) = 0$ otherwise. The
data obtained from nondiseased and diseased samples consist of
$\{(z_{i,x}, x_i) : i = 1, \ldots, m\}$ and $\{(z_{j,y}, y_j) : j = 1, \ldots, n\}$,
where $z_{i,x}$ is the observed covariate value in the nondiseased sample and
$z_{j,y}$ in the diseased sample. It should be noticed that the markers $X$ and $Y$
are evaluated at possibly different values of the covariate $Z$, and we are often
interested in estimating $A(z)$ even for $z$-values which were not measured in either
group or both. To estimate $A(z)$ at $Z = z$, one possibility is to include the marker
values $x_i$ and $y_j$ that fall into neighborhoods of $z$ with appropriate weight
functions. This consideration naturally leads to a bivariate kernel estimator that is
fully nonparametric,

$$\hat{A}_K(z) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} 1_{[0,\infty)}(y_j - x_i)\, K_{h_x}(z_{i,x} - z)\, K_{h_y}(z_{j,y} - z)}{\sum_{i=1}^{m} \sum_{j=1}^{n} K_{h_x}(z_{i,x} - z)\, K_{h_y}(z_{j,y} - z)}, \tag{7}$$

where $h_x$ and $h_y$ are bandwidths and $K_h(\cdot) = (1/h)K(\cdot/h)$ for a symmetric
kernel density $K(\cdot)$. However, $\hat{A}_K$ does not efficiently use the available
data due to the restriction on the local windows, nor do the regression models (1)
and (2) play any role here. Note that $\hat{A}_K$ is obtained by smoothing the binary
variables $1_{[0,\infty)}(y_j - x_i)$ corresponding to covariate observations
$(z_{i,x}, z_{j,y}) \in [z - h_x, z + h_x] \times [z - h_y, z + h_y]$ that are not
necessarily located on the diagonal (in fact, $\{z_{i,x}\}$ and $\{z_{j,y}\}$ may have
no overlap). It is unclear how to choose the bandwidths $h_x$ and $h_y$, which are
critical to the kernel regression estimation, as the standard cross-validation
procedure does not apply due to the absence of observed triples
$(z_{i,x}, z_{j,y}, 1_{[0,\infty)}(y_j - x_i))$ on the diagonal of the bivariate
covariate surface. More discussion and comparisons concerning $\hat{A}_K(z)$ will be
presented in the simulations in Section 4.
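
For concreteness, the bivariate kernel estimator can be sketched as a weighted
version of the Mann-Whitney double sum. This is an illustration under stated
assumptions, not the authors' implementation: the Epanechnikov kernel, the function
names and the toy data are all hypothetical choices, and the function returns NaN
when no observation falls in the local windows.

```python
import numpy as np

def epanechnikov(u):
    """Symmetric kernel density supported on [-1, 1] (an assumed choice)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_auc(z, zx, x, zy, y, hx, hy, kernel=epanechnikov):
    """Kernel-weighted proportion of pairs with y_j >= x_i near covariate z."""
    zx, x, zy, y = (np.asarray(a, dtype=float) for a in (zx, x, zy, y))
    wx = kernel((zx - z) / hx) / hx          # weights for nondiseased pairs
    wy = kernel((zy - z) / hy) / hy          # weights for diseased pairs
    weights = wy[:, None] * wx[None, :]      # product weights, shape (n, m)
    total = weights.sum()
    if total == 0.0:
        return float("nan")                  # both local windows are empty
    indicator = y[:, None] >= x[None, :]
    return float((weights * indicator).sum() / total)

zx, x = [0.5, 0.5], [0.0, 1.0]   # hypothetical nondiseased data
zy, y = [0.5, 0.5], [0.5, 2.0]   # hypothetical diseased data
auc_mid = kernel_auc(0.5, zx, x, zy, y, hx=1.0, hy=1.0)
auc_far = kernel_auc(5.0, zx, x, zy, y, hx=1.0, hy=1.0)
```

The second call illustrates the data-efficiency issue raised in the text: at a
covariate value far from the observed ones, no pair receives positive weight and the
estimate is undefined.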

Based on the above considerations, we propose a different nonparametric estimator
of $A(z)$ which utilizes the entire collection of data available and the regression
models (1) and (2). First, suppose that we can observe all the standardized
residuals, $i = 1, \ldots, m$, $j = 1, \ldots, n$,

$$e_{i,x} = \frac{x_i - f(z_{i,x})}{\sqrt{v_1(z_{i,x})}}, \qquad e_{j,y} = \frac{y_j - g(z_{j,y})}{\sqrt{v_2(z_{j,y})}}. \tag{8}$$

Recall that the distributions of $\varepsilon_1$ and $\varepsilon_2$ do not depend on
$Z$, implying that the $e_{i,x}$ are independently and identically distributed
(i.i.d.) with the c.d.f. $F^*$ for $i = 1, \ldots, m$, and the $e_{j,y}$ are i.i.d.
with the c.d.f. $G^*$ for $j = 1, \ldots, n$. In Pepe (1998) these standardized
residuals are used to obtain the empirical distributions of $\varepsilon_1$ and
$\varepsilon_2$. In a similar spirit, we propose a different way to utilize these
residuals to construct "working samples" $\{x_{1,z}, \ldots, x_{m,z}\}$ and
$\{y_{1,z}, \ldots, y_{n,z}\}$ as if they were all observed at $Z = z$,

$$x_{i,z} = f(z) + \sqrt{v_1(z)}\, e_{i,x}, \qquad y_{j,z} = g(z) + \sqrt{v_2(z)}\, e_{j,y}. \tag{9}$$

Then it is intuitive to use the proposed Covariate-Adjusted Mann-Whitney Estimator
(CAMWE) for $A(z)$,

$$A_M(z) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} 1_{[0,\infty)}(y_{j,z} - x_{i,z}). \tag{10}$$

This is a natural extension of the Mann-Whitney estimator since, in the case of no
covariate effect, $f$, $g$, $v_1$, $v_2$ are constant in $z$ and (10) becomes the
traditional Mann-Whitney statistic. For practical implementation, after obtaining
nonparametric estimates of $f$, $g$, $v_1$ and $v_2$, we do not have to choose other
tuning parameters for each covariate value $Z = z$, while (7) requires retuning.
Analogously we can calculate the sensitivity and specificity from the working samples
for $Z = z$,

$$q_M(z) = \frac{1}{n} \sum_{j=1}^{n} 1(y_{j,z} > c), \qquad p_M(z) = \frac{1}{m} \sum_{i=1}^{m} 1(x_{i,z} < c), \tag{11}$$

for a given threshold $c$, where $1(\cdot)$ denotes the indicator function. The ROC
curve for $Z = z$ can be obtained by plotting $q_M(z)$ versus $1 - p_M(z)$ for all
possible values of $c$.
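
The working-sample construction can be sketched in a few lines. The code below is an
illustration under stated assumptions rather than the authors' implementation: the
callables `f`, `g`, `v1`, `v2` stand for the mean and variance functions (true ones
for the "hypothetical" estimator, or fitted ones in practice), and the function names
and toy data are hypothetical.

```python
import numpy as np

def working_samples(z, zx, x, zy, y, f, g, v1, v2):
    """Standardize each observation at its own covariate, then relocate to z."""
    zx, x, zy, y = (np.asarray(a, dtype=float) for a in (zx, x, zy, y))
    ex = (x - f(zx)) / np.sqrt(v1(zx))       # standardized residuals, nondiseased
    ey = (y - g(zy)) / np.sqrt(v2(zy))       # standardized residuals, diseased
    xz = f(z) + np.sqrt(v1(z)) * ex          # working sample as if observed at z
    yz = g(z) + np.sqrt(v2(z)) * ey
    return xz, yz

def camwe(z, zx, x, zy, y, f, g, v1, v2):
    """Mann-Whitney proportion computed on the working samples at z."""
    xz, yz = working_samples(z, zx, x, zy, y, f, g, v1, v2)
    return float(np.mean(yz[:, None] >= xz[None, :]))

def sens_spec_at(z, c, zx, x, zy, y, f, g, v1, v2):
    """Working-sample sensitivity and specificity at threshold c."""
    xz, yz = working_samples(z, zx, x, zy, y, f, g, v1, v2)
    return float(np.mean(yz > c)), float(np.mean(xz < c))

# Hypothetical model: f(z) = z, g(z) = z + 1, unit variances.
f = lambda t: np.asarray(t, dtype=float)
g = lambda t: np.asarray(t, dtype=float) + 1.0
one = lambda t: np.ones_like(np.asarray(t, dtype=float))

zx, x = [0.0, 1.0, 2.0], [-1.0, 1.0, 3.0]
zy, y = [0.0, 2.0], [1.5, 2.5]
adjusted = camwe(1.0, zx, x, zy, y, f, g, one, one)
unadjusted = float(np.mean(np.asarray(y)[:, None] >= np.asarray(x)[None, :]))
```

In this toy example the covariate-adjusted value at $z = 1$ differs from the plain
Mann-Whitney proportion on the raw data, because the raw comparison mixes marker
values observed at different covariate values. With constant mean and variance
functions the two coincide.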

Remark 1. Note that the central idea is to construct the working samples
$\{x_{1,z}, \ldots, x_{m,z}\}$ and $\{y_{1,z}, \ldots, y_{n,z}\}$ for each $Z = z$.
The entire conditional ROC curve, given the covariate value $Z = z$, can be obtained
from (11). One can estimate any index of interest at $Z = z$ using these working
samples. For instance, the Youden Index (YI) (Youden, 1950) can be calculated by
$YI_M(z) = p_M(z) + q_M(z) - 1$, where $p_M(z)$ and $q_M(z)$ are defined by (11),
and its optimal threshold given $Z = z$ can be found via a numerical search.
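
One simple way to carry out such a numerical search is a grid search over candidate
thresholds built from the working samples themselves. The sketch below is a
hypothetical illustration: the choice of midpoints between consecutive pooled values
as candidates, and the function name `youden_index`, are assumptions rather than part
of the paper.

```python
import numpy as np

def youden_index(xz, yz):
    """Maximize specificity + sensitivity - 1 over candidate thresholds.

    xz, yz: working samples for the nondiseased and diseased groups at a
    fixed covariate value. Returns (maximal index, threshold attaining it).
    """
    xz = np.asarray(xz, dtype=float)
    yz = np.asarray(yz, dtype=float)
    pooled = np.sort(np.concatenate([xz, yz]))
    candidates = (pooled[:-1] + pooled[1:]) / 2.0   # midpoints between values
    best_yi, best_c = -np.inf, None
    for c in candidates:
        yi = np.mean(xz < c) + np.mean(yz > c) - 1.0
        if yi > best_yi:                            # keep the first maximizer
            best_yi, best_c = float(yi), float(c)
    return best_yi, best_c

yi, c_opt = youden_index([0.0, 1.0, 2.0], [1.5, 2.5, 3.5])
```

Between two consecutive pooled values the index is constant, so checking one point
per gap is enough to find its maximum over all thresholds that separate the data.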

Remark 2. In principle, the proposed approach can be extended to the case of 
multiple covariates using different strategies. A natural consideration is to use multi- 
variate nonparametric smoothing techniques that require extensive computation. An 
alternative is to use additive frameworks for mean and variance structures respec- 
tively, then construct the working sample in a similar spirit for each set of covariate 
values of interest. 

2.4 Implementation via Nonparametric Regression 

We exploit local polynomial regression for estimating the functions $f$ and $g$. Let
$K(\cdot)$ be a compactly supported symmetric kernel density function with finite
variance, $h_1 = h_1(m)$ a sequence of bandwidths used to estimate $f$, and
$h_2 = h_2(n)$ a sequence of bandwidths for $g$. Let $p$ be the order of the local
polynomial fit, e.g., $p = 0$ and $p = 1$ correspond to local constant and local
linear fits, respectively. An odd order fit is often suggested (Fan and Gijbels,
1996) for both theoretical and practical considerations. In particular, for
estimating the regression function itself, a common choice is the local linear fit
with $p = 1$. Denote the resulting $p$th order local polynomial estimators of $f(z)$
and $g(z)$ by $\hat{f}(z)$ and $\hat{g}(z)$. Next, the variance functions $v_1(z)$ and
$v_2(z)$ for heteroscedastic errors are estimated by fitting local polynomial
regression to the squared residuals, $V_{i,x}$ and $V_{j,y}$, $i = 1, \ldots, m$,
$j = 1, \ldots, n$,

$$V_{i,x} = \{x_i - \hat{f}(z_{i,x})\}^2, \qquad V_{j,y} = \{y_j - \hat{g}(z_{j,y})\}^2, \tag{12}$$

with bandwidths $b_1 = b_1(m)$ and $b_2 = b_2(n)$. The detailed formulas of the
aforementioned local polynomial estimators are given in Appendix 1. In the case of
homoscedastic errors, $v_1(z) = v_1$ and $v_2(z) = v_2$, it is easy to obtain
root-$n$ consistent estimators (Hall and Marron, 1990; Hall et al., 1990). The
theoretical properties in Section 3 are still valid with slight modifications. In
practice, the bandwidths $h_1$, $h_2$, $b_1$ and $b_2$ are chosen by the standard
technique of leave-one-out cross-validation for estimating the mean and variance
functions, although other existing techniques can certainly be applied. Such
bandwidths usually fulfill the assumptions needed for the theoretical developments
in Section 3 for sufficiently large sample sizes. Substituting the local polynomial
estimators $\hat{f}(z)$, $\hat{g}(z)$, $\hat{v}_1(z)$ and $\hat{v}_2(z)$ for the
unknown quantities in formulae (3)-(5), (10) and (11) provides the point estimators
$\hat{A}_N(z)$, $\hat{p}_N(z)$, $\hat{q}_N(z)$, $\hat{A}_M(z)$, $\hat{p}_M(z)$ and
$\hat{q}_M(z)$ for covariate $Z = z$.
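
The two-stage fit (mean function first, then variance function from squared
residuals) can be sketched with a local linear smoother. This is a minimal
illustration, assuming a standard kernel-weighted least squares form of the local
linear fit with an Epanechnikov kernel; the exact formulas of Appendix 1 are not
reproduced here, and the function names, the fixed bandwidths and the small positive
floor on the variance estimate are all assumptions.

```python
import numpy as np

def epanechnikov(u):
    """Compactly supported symmetric kernel density on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear(z0, z, resp, h):
    """Local linear (p = 1) estimate of E(resp | Z = z0) with bandwidth h."""
    z = np.asarray(z, dtype=float)
    resp = np.asarray(resp, dtype=float)
    sw = np.sqrt(epanechnikov((z - z0) / h))             # square-root weights
    design = np.column_stack([np.ones_like(z), z - z0])  # intercept and slope
    coef, *_ = np.linalg.lstsq(design * sw[:, None], resp * sw, rcond=None)
    return float(coef[0])                                # intercept = fit at z0

def fit_mean_and_variance(z, resp, h, b, floor=1e-8):
    """Return callables for the estimated mean and variance functions."""
    z = np.asarray(z, dtype=float)
    resp = np.asarray(resp, dtype=float)

    def mean_hat(t):
        return np.array([local_linear(t0, z, resp, h) for t0 in np.atleast_1d(t)])

    sq_resid = (resp - mean_hat(z)) ** 2                 # squared residuals

    def var_hat(t):
        raw = np.array([local_linear(t0, z, sq_resid, b) for t0 in np.atleast_1d(t)])
        return np.maximum(raw, floor)                    # assumed positivity floor

    return mean_hat, var_hat

# Noise-free linear toy data: a local linear fit reproduces the line.
z = np.linspace(0.0, 1.0, 21)
mean_hat, var_hat = fit_mean_and_variance(z, 2.0 + 3.0 * z, h=0.3, b=0.3)
fit_at_half = mean_hat(0.5)[0]
```

Applying `fit_mean_and_variance` separately to the nondiseased and diseased samples
would give the four fitted functions needed for the plug-in estimators; in practice
the bandwidths would be chosen by leave-one-out cross-validation rather than fixed.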

To evaluate confidence limits and variances for the AUC under normal noise, the
existing formulations (Guttman et al., 1988; Faraggi, 2000, among others) are no
longer valid due to the nonparametric regression. In principle we can derive the
approximate variance for the AUC under normal noise, based on the asymptotic
normality of the local polynomial estimators (Fan and Gijbels, 1996) using the
Cramér-Wold device. However, due to the complicated asymptotic expressions with
unknown functionals and their derivatives, the evaluation of such asymptotic
quantities will require extensive pilot smoothing and further approximations. This
might degrade the accuracy and is not worth pursuing further. Thus we choose to
obtain confidence limits and variance estimates for the AUC via "bootstrapping the
original data" as proposed by Efron and Tibshirani (1993). We do not repeat the
procedure here for conciseness. While this approach can be justified in the normal
noise case due to the limiting distributions in Theorem 1, it may not be the case
under general noise, for which the asymptotic normality of the CAMWE $\hat{A}_M(z)$
is unknown at this moment. Nevertheless, the simulation performed in Section 4.1
offers empirical support for this bootstrap procedure in the general noise case.
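
A bootstrap of the original data can be sketched as follows. The paper does not spell
out the procedure, so the details here are assumptions: (covariate, marker) pairs are
resampled with replacement separately within each group, the estimator is recomputed
on each resample, and a percentile interval is reported. The estimator passed in the
example is a placeholder that ignores the covariates; in practice it would be the
covariate-adjusted estimator evaluated at the covariate value of interest.

```python
import numpy as np

def bootstrap_ci(estimator, zx, x, zy, y, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval from within-group pair resampling."""
    zx, x, zy, y = (np.asarray(a, dtype=float) for a in (zx, x, zy, y))
    rng = np.random.default_rng(seed)
    m, n = len(x), len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, m, size=m)   # resampled nondiseased indices
        j = rng.integers(0, n, size=n)   # resampled diseased indices
        stats[b] = estimator(zx[i], x[i], zy[j], y[j])
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lo), float(hi)

def plain_mann_whitney(zx, x, zy, y):
    """Placeholder estimator that ignores the covariates."""
    return float(np.mean(y[:, None] >= x[None, :]))

rng = np.random.default_rng(1)
zx, zy = rng.uniform(1, 5, 40), rng.uniform(1, 5, 40)
x, y = rng.normal(0.0, 1.0, 40), rng.normal(1.0, 1.0, 40)
lo, hi = bootstrap_ci(plain_mann_whitney, zx, x, zy, y, n_boot=200)
```

Because the smoothing step is repeated inside every resample when a
covariate-adjusted estimator is used, the variability due to estimating the mean and
variance functions is carried into the interval.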

Remark. Jointly choosing the four bandwidths with the AUC estimator as the target is
prohibitively expensive, or even impossible, with available computing resources.
Even if the computational load were not an issue, we would have no suitable
criterion to perform the joint optimization, for two reasons. First, if one bases the
criterion on asymptotic bias and variance, these quantities involve unknown
functionals and their derivatives and are too complicated for practical use. It
should also be noticed that such asymptotic expressions are established only for the
normal noise case. Second, if one attempts cross-validation directly for $A(z)$,
there are no observed values of the AUC at $Z = z$ available, which is a similar
issue to the one discussed for $\hat{A}_K$ in Section 2.3.



3 Theoretical Properties 

In this section we present the asymptotic theory developed for the nonparametric
estimators of the AUC with covariate adjustment for $Z = z$ under both normal and
general noise assumptions. One can easily extend these arguments to obtain the
corresponding asymptotic theory for the sensitivity $q(z)$ and specificity $p(z)$
with a given threshold value $c$. These are not presented here for conciseness.

3.1 Asymptotic Properties under Normal Noise

We begin with the asymptotic normality of the estimated AUC under the normal
noise assumption, where the target $A(z)$ is exactly $A_N(z)$, i.e., $A(z) = A_N(z)$.
Let $\theta(z)$ be the density function of the covariate $Z$, which is treated as a
random variable. Denote by $N(z)$ a neighborhood of $z$. Assume that, for a given
value $z$ of $Z$,

(A1) $\theta(z) > 0$ and $\theta(\cdot)$ is continuous in $N(z)$.

Put $\eta_1(z) = E(\varepsilon_1^3 \mid Z = z)$, $\eta_2(z) = E(\varepsilon_2^3 \mid Z = z)$,
$\kappa_1(z) = \mathrm{Var}(\varepsilon_1^2 \mid Z = z)$ and
$\kappa_2(z) = \mathrm{Var}(\varepsilon_2^2 \mid Z = z)$. Assume that, for a given $z$,

(A2) $v_1(z) > 0$, $f^{(p+1)}(\cdot)$, $v_1^{(p+1)}(\cdot)$, $\eta_1(\cdot)$ and $\kappa_1(\cdot)$ are continuous in $N(z)$.

Recall that $h_1 = h_1(m)$, $b_1 = b_1(m)$, $h_2 = h_2(n)$ and $b_2 = b_2(n)$ are the
sequences of bandwidths for estimating $f(z)$, $v_1(z)$, $g(z)$ and $v_2(z)$. One can
see that, if the bandwidths $h_1$ and $b_1$ are chosen optimally for estimating
$f(z)$ and $v_1(z)$, then $h_1$ and $b_1$ will be of the same order in terms of the
sample size $m$. Thus we assume the following, as $m \to \infty$,

(A3) $h_1 \to 0$, $m h_1 \to \infty$, $m h_1^{2p+3} \to d_1^2$ for some $d_1 > 0$, $b_1/h_1 \to \rho_1$ for some
$0 < \rho_1 < \infty$.

Analogously, for the estimation of $g(z)$ and $v_2(z)$, we assume that, for a given $z$,

(A4) $v_2(z) > 0$, $g^{(p+1)}(\cdot)$, $v_2^{(p+1)}(\cdot)$, $\eta_2(\cdot)$ and $\kappa_2(\cdot)$ are continuous in $N(z)$;

(A5) $h_2 \to 0$, $n h_2 \to \infty$, $n h_2^{2p+3} \to d_2^2$ for some $d_2 > 0$, and $b_2/h_2 \to \rho_2$ for some
$0 < \rho_2 < \infty$.

Here we consider an odd order $p$ for the local polynomial estimators of $f$, $v_1$,
$g$ and $v_2$, as argued in Section 2.4. The same order $p$ is used mainly for
notational convenience, while we certainly can choose different orders in practice.
With slight modifications, the results can be easily adapted to possibly different
orders as well as to the case of even $p$. For the symmetric kernel density
$K(\cdot)$ we assume that the $j$th moment $\mu_j(K) = \int u^j K(u)\,du$ exists for
all integers $j > 0$.

(A6) $R(K) = \int K^2(u)\,du < \infty$, $\mu_2(K) > 0$.

For convenience, we introduce the notion of the order of a kernel function. We
say $K_0$ is an $\ell$th order kernel function, provided that $\mu_0(K_0) = 1$,
$\mu_j(K_0) = 0$ for $j = 1, \ldots, \ell - 1$ and $\mu_\ell(K_0) \neq 0$. It is
obvious that $K(\cdot)$ is a 2nd order kernel. Let the $(p+1) \times (p+1)$ matrix
$S_p = \{\mu_{j+l}(K)\}_{0 \le j, l \le p}$, let $e_k$ be the $(p+1) \times 1$ vector
with the $k$th element equal to 1 and 0 elsewhere, and

$$K^*(u) = e_1^T S_p^{-1} (1, u, \ldots, u^p)^T K(u), \tag{13}$$

which is often referred to as the equivalent kernel. One can verify that $K^*(\cdot)$
is a $(p+1)$th order kernel when $p$ is odd. Also denote
$R(K^*, \rho) = \int K^*(u) K^*(u/\rho)\,du$ for any $0 < \rho < \infty$.
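
The order property of the equivalent kernel can be checked numerically. The sketch
below is an illustration only: it assumes the Epanechnikov kernel for $K$, a simple
trapezoidal rule for the moments, and hypothetical function names. For $p = 3$ the
moments of $K^*$ of orders 1 to 3 should vanish while the fourth should not, in line
with the statement that $K^*$ is a $(p+1)$th order kernel for odd $p$.

```python
import numpy as np

GRID = np.linspace(-1.0, 1.0, 20001)  # support of the Epanechnikov kernel

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def moment(fun, j, grid=GRID):
    """Trapezoidal approximation of the j-th moment of fun."""
    vals = grid**j * fun(grid)
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def equivalent_kernel(p, kernel=epanechnikov):
    """Equivalent kernel e_1^T S_p^{-1} (1, u, ..., u^p)^T K(u)."""
    S = np.array([[moment(kernel, j + l) for l in range(p + 1)]
                  for j in range(p + 1)])
    # S is symmetric, so e_1^T S^{-1} equals (S^{-1} e_1)^T.
    first_row = np.linalg.solve(S, np.eye(p + 1)[0])

    def kstar(u):
        u = np.asarray(u, dtype=float)
        powers = np.vstack([u**k for k in range(p + 1)])
        return (first_row @ powers) * kernel(u)

    return kstar

kstar3 = equivalent_kernel(3)
moments = [moment(kstar3, j) for j in range(5)]  # moments of orders 0 to 4
```

For the local linear case $p = 1$ with a symmetric kernel, the matrix $S_p$ is
diagonal and the equivalent kernel reduces to $K$ itself, which the same code
reproduces up to numerical integration error.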

Lemma 1 in Appendix 2 provides the joint asymptotic distributions of the local
polynomial estimators $\{\hat{f}(z), \hat{v}_1(z)\}^T$ and $\{\hat{g}(z), \hat{v}_2(z)\}^T$,
which is the basis for deriving the asymptotic distributions of $\hat{A}_N(z)$. The
difficulty in the proof of Lemma 1 is to deal with the dependence between the mean
and variance estimators, while $\{\hat{f}(z), \hat{v}_1(z)\}^T$ and
$\{\hat{g}(z), \hat{v}_2(z)\}^T$ are independent; see Appendix 2 for details. Based
on Lemma 1, we exploit the Cramér-Wold device to obtain the asymptotic distribution
of $\hat{A}_N(z)$ as follows.

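Theorem 1 Under the assumptions (A1)-(A6) for a given $z$,

• if $n/m \to \infty$, $\sqrt{m h_1}\,\{\hat{A}_N(z) - A_N(z)\} \xrightarrow{D} N\{B_1(z), V_1(z)\}$, where
$\phi(u) = (2\pi)^{-1/2} e^{-u^2/2}$, $\delta(z) = \{g(z) - f(z)\}/\sqrt{v_1(z) + v_2(z)}$,

$$B_1(z) = -\frac{\phi\{\delta(z)\}\,\mu_{p+1}(K^*)\,d_1}{(p+1)!\,\sqrt{v_1(z) + v_2(z)}} \left[ f^{(p+1)}(z) + \frac{\{g(z) - f(z)\}\, v_1^{(p+1)}(z)\, \rho_1^{p+1}}{2\{v_1(z) + v_2(z)\}} \right],$$

$$V_1(z) = \cdots + \frac{\{g(z) - f(z)\}\, R(K^*, \rho_1)\, \eta_1(z)}{\{v_1(z) + v_2(z)\}\,\rho_1} + \frac{\{g(z) - f(z)\}^2\, R(K^*)\, \kappa_1(z)}{4\{v_1(z) + v_2(z)\}^2\,\rho_1};$$

• if $n/m \to 0$, $\sqrt{n h_2}\,\{\hat{A}_N(z) - A_N(z)\} \xrightarrow{D} N\{B_2(z), V_2(z)\}$, where

$$B_2(z) = \frac{\phi\{\delta(z)\}\,\mu_{p+1}(K^*)\,d_2}{(p+1)!\,\sqrt{v_1(z) + v_2(z)}} \left[ g^{(p+1)}(z) - \frac{\{g(z) - f(z)\}\, v_2^{(p+1)}(z)\, \rho_2^{p+1}}{2\{v_1(z) + v_2(z)\}} \right],$$

$$V_2(z) = \cdots + \frac{\{g(z) - f(z)\}\, R(K^*, \rho_2)\, \eta_2(z)}{\{v_1(z) + v_2(z)\}\,\rho_2} + \frac{\{g(z) - f(z)\}^2\, R(K^*)\, \kappa_2(z)}{4\{v_1(z) + v_2(z)\}^2\,\rho_2};$$

• if $n/m \to \lambda$ for some $0 < \lambda < \infty$, $\sqrt{m h_1}\,\{\hat{A}_N(z) - A_N(z)\} \xrightarrow{D} N\{B_3(z), V_3(z)\}$,
where

$$B_3(z) = B_1(z) + \lambda^{-\frac{p+1}{2p+3}} B_2(z), \qquad V_3(z) = V_1(z) + \lambda^{-\frac{2p+2}{2p+3}} V_2(z). \tag{16}$$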



Besides the pointwise limiting distributions, we also establish the optimal rates
for strong uniform convergence of $\hat{A}_N$ in Theorem 2. Denote by $\mathcal{Z}$
the set of possible values of $Z$ (usually an interval on the real line). The
additional assumptions below are needed for the uniform convergence results.

(A7.1) $E(|X|^s) < \infty$, $\sup_{z \in \mathcal{Z}} \int |x|^s\, p_{(Z,X)}(z, x)\,dx < \infty$ for some $s > 2$, where
$p_{(Z,X)}(z, x)$ is the joint density of $(Z, X)$.

(A7.2) $E(|Y|^s) < \infty$, $\sup_{z \in \mathcal{Z}} \int |y|^s\, p_{(Z,Y)}(z, y)\,dy < \infty$ for some $s > 2$, where
$p_{(Z,Y)}(z, y)$ is the joint density of $(Z, Y)$.

For the proof of Theorem 2 we need to modify (A1)-(A6) as follows. For convenience
we impose conditions on the equivalent kernel $K^*$ in (13) instead of the original
kernel $K$.

(A1†) $\theta(\cdot) > 0$, and $\theta(\cdot)$ is bounded and continuous on $\mathcal{Z}$.

(A2†) On the domain $\mathcal{Z}$, $v_1(\cdot) > \delta_1$ for some $\delta_1 > 0$ and is bounded, $f(\cdot)$ is bounded,
$f^{(p+1)}(\cdot)$, $v_1^{(p+1)}(\cdot)$, $\eta_1(\cdot)$ and $\kappa_1(\cdot)$ are bounded and continuous.

(A3†) $\sum_m h_1^{\lambda_1} < \infty$ for some $\lambda_1 > 0$, $m^{2\rho_1 - 1} h_1 \to \infty$ for some $\rho_1 < 1 - s^{-1}$, where
$s > 2$ satisfies (A7.1).

(A4†) On the domain $\mathcal{Z}$, $v_2(\cdot) > \delta_2$ for some $\delta_2 > 0$ and is bounded, $g(\cdot)$ is bounded,
$g^{(p+1)}(\cdot)$, $v_2^{(p+1)}(\cdot)$, $\eta_2(\cdot)$ and $\kappa_2(\cdot)$ are bounded and continuous.

(A5†) $\sum_n h_2^{\lambda_2} < \infty$ for some $\lambda_2 > 0$, $n^{2\rho_2 - 1} h_2 \to \infty$ for some $\rho_2 < 1 - s^{-1}$, where
$s > 2$ satisfies (A7.2).

(A6†) $K^*$ is uniformly continuous, absolutely integrable with respect to Lebesgue
measure on $\mathbb{R}$ and of bounded variation, $K^*(u) \to 0$ as $|u| \to \infty$, and
$\int \{|u \log(|u|)|\}^{1/2}\, |dK^*(u)| < \infty$.

Lemma 2 in Appendix 2 presents the strong uniform convergence rates of the local
polynomial estimators of the mean and variance functions. The strong uniform
convergence rate of $\hat{A}_N$ then follows immediately, as stated below, where a.s.
abbreviates "almost surely".

Theorem 2 Under the assumptions (A1†)-(A6†), (A7.1) and (A7.2), let
$\tau_m = h_1^{p+1} + \sqrt{\log(1/h_1)/(m h_1)}$ and
$\omega_n = h_2^{p+1} + \sqrt{\log(1/h_2)/(n h_2)}$, then

$$\sup_{z \in \mathcal{Z}} |\hat{A}_N(z) - A_N(z)| = O(\tau_m + \omega_n) \quad \text{a.s.} \tag{17}$$
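
3.2 Asymptotic Properties under General Noise

Now we turn to the asymptotic properties of the CAMWE $\hat{A}_M(z)$ of $A(z)$ under
the general noise assumption. We first state the asymptotic normality of the
"hypothetical" estimator $A_M(z)$ in (10) that contains the true values of the
unknown mean and variance functions, while our target is
$A(z) = P(Y > X \mid Z = z)$. Recall that $F^*$ and $G^*$ are the c.d.f.s of the
standardized errors $\varepsilon_1$ and $\varepsilon_2$, and do not depend on the
covariate $Z$. Define

$$h_{1,0}(\varepsilon_1; z) = 1 - G^*\left\{\frac{f(z) - g(z)}{\sqrt{v_2(z)}} + \frac{\sqrt{v_1(z)}\,\varepsilon_1}{\sqrt{v_2(z)}}\right\}, \qquad h_{0,1}(\varepsilon_2; z) = F^*\left\{\frac{g(z) - f(z)}{\sqrt{v_1(z)}} + \frac{\sqrt{v_2(z)}\,\varepsilon_2}{\sqrt{v_1(z)}}\right\}.$$

Set $\xi_{1,0}(z) = \mathrm{var}\{h_{1,0}(\varepsilon_1; z)\}$ and $\xi_{0,1}(z) = \mathrm{var}\{h_{0,1}(\varepsilon_2; z)\}$.

Theorem 3 For the regression models (1) and (2) and a given $z$,

$$E\{A_M(z)\} = A(z), \qquad \mathrm{var}\{A_M(z)\} = O\{\cdots\}. \tag{18}$$

If $n/m \to \lambda$ for some $0 < \lambda < \infty$, $\xi_{1,0}(z) > 0$ and $\xi_{0,1}(z) > 0$, then

$$\sqrt{m + n}\,\{A_M(z) - A(z)\} \xrightarrow{D} N\left\{0,\ \frac{\xi_{1,0}(z)}{\lambda^*} + \frac{\xi_{0,1}(z)}{1 - \lambda^*}\right\}, \tag{19}$$

where $\lambda^* = 1/(1 + \lambda)$.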

In the next theorem we establish the MSE consistency of the CAMWE $\hat{A}_M(z)$ for
the "hypothetical" estimator $A_M(z)$ for a given covariate value $Z = z$, based on
uniform consistency of the estimated mean and variance functions. It is noted in the
proof that we do not actually need the optimal strong uniform convergence rates
stated in Lemma 2, as these rates cannot be passed to $\hat{A}_M(z)$, while uniform
consistency in probability is sufficient. Thus the regularity conditions (A3†) and
(A5†) can be relaxed to the following.

(A3*) $h_1 \to 0$, $m^{\rho_1} h_1 \to \infty$ for some $\rho_1 < 1 - s^{-1}$, where $s$ satisfies (A7.1).
(A5*) $h_2 \to 0$, $n^{\rho_2} h_2 \to \infty$ for some $\rho_2 < 1 - s^{-1}$, where $s$ satisfies (A7.2).

We also need the following additional assumption.

(A8) $F^*(\cdot)$ and $G^*(\cdot)$ are continuous on their domains.

Theorem 4 Under (A8) and the assumptions for Theorem 2 with (A3†) and (A5†)
replaced by (A3*) and (A5*), for a given $z$,

$$E[\{\hat{A}_M(z) - A_M(z)\}^2] \to 0. \tag{20}$$

We conclude this section with the following corollary, which is a direct consequence
of Theorems 3 and 4. Note that the MSE discrepancy between the estimated and true AUC
at $Z = z$ is dominated by the nonparametric rate in (20), which is usually slower
than the parametric rate $(m + n)^{-1/2}$, although its order of magnitude is not
obtainable, at least to our knowledge.

Corollary 1 Under (A8) and the assumptions for Theorem 2 with (A3†) and (A5†)
replaced by (A3*) and (A5*), for a given $z$,

$$E[\{\hat{A}_M(z) - A(z)\}^2] \to 0. \tag{21}$$

4 Simulations and Data Example 
4.1 Simulations 

The purpose of the simulations is to assess the performance of the methods for
estimating the AUC in nonparametric regression settings. We have not compared our
method with parametric models since the two approaches address different situations.
If a parametric model is correctly specified, its performance will be superior to a
nonparametric procedure; however, if there is no known parametric model suitable for
the data considered, one will have no choice but to use the nonparametric tools
available.

We consider three situations for illustration. In the first situation the underlying
models are, for non-diseased and diseased individuals respectively,

$$x_i = \cdots, \qquad y_j = 6 + 1.5 z_{j,y} + 1.5 \sin(z_{j,y}) + \sqrt{z_{j,y} - 0.5} + \sqrt{v_2(z_{j,y})}\, e_{j,y}, \tag{22}$$

where the errors $e_{i,x}$ and $e_{j,y}$ are standard normal, the conditional
variance functions are $v_1(z) = 0.3 + \Phi(2z - 6)$ and $v_2(z) = 1.5 + \Phi(2z - 6)$,
$i = 1, \ldots, m$, $j = 1, \ldots, n$. The covariates $z_{i,x}$ and $z_{j,y}$ are
independently generated from $U[1, 5]$, and moderate sample sizes $n = m = 40$ are
used. The identical setting is used in the second situation, except that the errors
$e_{i,x}$ and $e_{j,y}$ are generated from a Student-t distribution with 3 degrees
of freedom and rescaled to have zero mean and unit variance.
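
The ingredients of the first two situations that are stated explicitly (covariate
distribution, variance functions and error distributions) can be generated as
sketched below. This is only an illustration under stated assumptions: the mean
functions are not included, the constants are as read from the text, and the function
names and seed are hypothetical. The rescaling of the Student-t errors uses the fact
that a t distribution with 3 degrees of freedom has variance 3.

```python
import numpy as np
from math import erf, sqrt

def std_normal_cdf(t):
    """Standard normal c.d.f., applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(t, dtype=float) / sqrt(2.0)))

def v1(z):
    """Variance function of the nondiseased group (constants as read from the text)."""
    return 0.3 + std_normal_cdf(2.0 * np.asarray(z, dtype=float) - 6.0)

def v2(z):
    """Variance function of the diseased group."""
    return 1.5 + std_normal_cdf(2.0 * np.asarray(z, dtype=float) - 6.0)

def standardized_t3(rng, size):
    """Student-t(3) draws rescaled to unit variance."""
    return rng.standard_t(3, size=size) / sqrt(3.0)

rng = np.random.default_rng(0)          # hypothetical seed
m = n = 40
zx = rng.uniform(1.0, 5.0, size=m)      # nondiseased covariates
zy = rng.uniform(1.0, 5.0, size=n)      # diseased covariates
ex_normal = rng.standard_normal(m)      # situation 1: standard normal errors
ex_t = standardized_t3(rng, m)          # situation 2: heavy-tailed errors
noise_x = np.sqrt(v1(zx)) * ex_t        # heteroscedastic noise term for X
```

Adding the corresponding mean function evaluated at the covariates to such a noise
term yields the simulated marker values.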

The third situation, in which the log-transformed responses have normal errors 
e*.j. and e*,y, i.e., the responses are generated from log-normal models, is designed to 
demonstrate the robustness of the proposed CAMWE Am{z). Since a log-transform 
often stabilizes the variability, we assume a constant variance on log-scale for both 
groups. Let /o(-) and go{-) be the mean functions on log-scale, while /(•) and g{-) 
correspond to the original scale. Prom the properties of the log-normal distribution, 
one has 

log{/(^)} = Mz) + aV2, v^{z) = {e'^' - l)f{z) 
log{g{z)} = go{z) + aV2, V2{z) = {e^" - l)g\z). 

We choose $f(z) = 1 - 0.5z - 0.25\sin(\pi z)$ and
$g(z) = 1 - 0.5z - 0.25\sin(\pi z) + 1.5\sqrt{z + 0.5}$, $z \in [0, 1]$, and
$\sigma^2 = 1/3$. Then the models are completely determined and can be written as

$$x_i = \exp\{f_0(z_{i,x}) + \sigma e^*_{i,x}\}, \qquad y_j = \exp\{g_0(z_{j,y}) + \sigma e^*_{j,y}\}, \tag{23}$$

where the covariates $z_{i,x}$ and $z_{j,y}$ are independently generated from $U[0, 1]$,
and $e^*_{i,x}$ and $e^*_{j,y}$ are standard normal errors, $i = 1, \ldots, m$,
$j = 1, \ldots, n$.
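
The link between the log-scale and original-scale quantities can be checked numerically.
The sketch below is illustrative only: it assumes the log-scale variance is
$\sigma^2 = 1/3$, uses the non-diseased mean function $f$ as read from the text, and the
helper names are hypothetical. It confirms that choosing $f_0(z) = \log f(z) - \sigma^2/2$
makes $\exp\{f_0(z) + \sigma e^*\}$ have mean $f(z)$ and variance
$(e^{\sigma^2} - 1) f^2(z)$.

```python
import math

SIGMA2 = 1.0 / 3.0  # assumed log-scale variance sigma^2

def f(z):
    """Mean function of the non-diseased group on the original scale."""
    return 1.0 - 0.5 * z - 0.25 * math.sin(math.pi * z)

def f0(z):
    """Log-scale mean implied by log f(z) = f0(z) + sigma^2 / 2."""
    return math.log(f(z)) - SIGMA2 / 2.0

def lognormal_mean_var(mu, sigma2):
    """Mean and variance of exp(mu + sigma * e) with e ~ N(0, 1)."""
    mean = math.exp(mu + sigma2 / 2.0)
    var = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, var

z = 0.5
mean, var = lognormal_mean_var(f0(z), SIGMA2)
# mean agrees with f(z); var agrees with (exp(sigma^2) - 1) * f(z)^2
```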

With the generated data we compared three estimators: $\hat A_N(z)$ under the normal
noise assumption, the CAMWE $\hat A_M(z)$ under the general noise assumption, and the
kernel estimator $\hat A_K(z)$. For bandwidth choices, recall that joint selection aiming
for $\hat A_N(z)$ and $\hat A_M(z)$ is not feasible and that cross-validation fails for
$\hat A_K(z)$. To make the comparisons possible, for $\hat A_N(z)$ and $\hat A_M(z)$ we
minimized the true integrated squared errors of the individual function estimates, e.g.
$\int \{\hat f(z; h_1) - f(z)\}^2\,dz$ to select $h_1$, and similarly for $h_2$, $b_1$ and
$b_2$, while $\int \{\hat A_K(z; h_x, h_y) - A(z)\}^2\,dz$ was minimized to choose $h_x$
and $h_y$ in $\hat A_K(z)$. Consequently, if one targets $A(z)$, the bandwidths chosen for
$\hat A_N(z)$ and $\hat A_M(z)$ may not be as "optimal" as those for $\hat A_K(z)$.
However, it is demonstrated below that even in such a disadvantageous situation the
proposed estimators, especially $\hat A_M(z)$, are still preferable. We used the sample
sizes $n = m = 40$ and $n = m = 100$; all the estimates improved with increased sample
sizes, as expected. All three AUC estimates are obtained by applying the estimation
procedures to the simulated data $\{(z_{i,x}, x_i)\}_{i=1,\ldots,m}$ and
$\{(z_{j,y}, y_j)\}_{j=1,\ldots,n}$ (on the original scale throughout) in the
aforementioned three situations. Monte Carlo averages (calculated from 500 runs in each
case) of the mean squared errors at different values of $z$ are presented in Figure 1. We
can see that, for the normal noise model, the CAMWE $\hat A_M(z)$ and the normal
estimator $\hat A_N(z)$ are comparable and both outperform the kernel estimator
$\hat A_K(z)$. Although $\hat A_K(z)$ improves upon $\hat A_N(z)$ under the heavy-tailed
Student-t noise model, the CAMWE $\hat A_M(z)$ is still the most effective. For the
log-normal model, when we apply these three estimation procedures to the original
responses, the CAMWE and kernel estimators yield comparable results (the CAMWE seems
slightly better), and both significantly improve upon the normal estimator.
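
To make the covariate-adjusted Mann-Whitney idea concrete, here is a minimal sketch. It
is an illustration under stated assumptions, not the authors' implementation: the
working-sample form (relocate by the mean function, rescale by the ratio of variance
functions) is the one implied by the decomposition of $w_{ij}$ in the proof of Theorem 4,
the mean and variance functions are passed in as known callables rather than estimated by
local polynomial smoothing, and all function names are hypothetical.

```python
import math

def working_sample(z, covs, resp, mean_fn, var_fn):
    """Relocate and rescale responses observed at covs to the covariate value z."""
    return [
        mean_fn(z) + math.sqrt(var_fn(z) / var_fn(zi)) * (ri - mean_fn(zi))
        for zi, ri in zip(covs, resp)
    ]

def mann_whitney_auc(xs, ys):
    """Proportion of pairs (x_i, y_j) with y_j - x_i >= 0."""
    wins = sum(1 for x in xs for y in ys if y - x >= 0)
    return wins / (len(xs) * len(ys))

def camwe(z, zx, x, zy, y, f, g, v1, v2):
    """Mann-Whitney statistic computed on the two working samples at z."""
    xz = working_sample(z, zx, x, f, v1)
    yz = working_sample(z, zy, y, g, v2)
    return mann_whitney_auc(xz, yz)
```

In practice `f`, `g`, `v1` and `v2` would be replaced by their nonparametric estimates,
which is what distinguishes the actual estimator from this idealized version.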

We carried out an additional study to examine the empirical performance of the pointwise
confidence bands and variance estimates obtained by "bootstrapping the data" in the
general noise case, i.e. when the CAMWE $\hat A_M(z)$ is used for estimation. We used the
same settings for the three models with normal, Student-t with 3 degrees of freedom and
log-normal noises, respectively. The benchmark used for comparison is the 95% pointwise
confidence bands and variance estimates obtained from 500 Monte Carlo runs. In each Monte
Carlo run, $\hat A_M(z)$ was obtained and we bootstrapped the data 1000 times to
calculate 95% bootstrap bands (defined between the 2.5th and the 97.5th percentiles) and
the bootstrap sample variance. All the bandwidths involved in the estimation are selected
by leave-one-out cross-validation in the respective smoothing steps. In the top panels of
Figure 2 we report, for all three data-generating models with moderate sample sizes
$n = m = 40$, the comparisons between the Monte Carlo averages of the bootstrap bands and
the Monte Carlo bands. In the bottom panels, similar comparisons are shown for the
averaged bootstrap variance estimates of $\hat A_M(z)$ against the Monte Carlo variances.
From Figure 2, for the CAMWE $\hat A_M(z)$, the averages of the confidence bands obtained
by "bootstrapping the data" approximate the 95% pointwise Monte Carlo bands well. The
same can be said about the averages of the bootstrap variance estimates. This provides
some empirical evidence for using the bootstrap confidence bands and variance estimates
for the CAMWE $\hat A_M(z)$ in the general noise case. For the normal noise model, we
have done similar comparisons and the results are almost identical to those obtained for
$\hat A_M(z)$ (thus not reported for brevity).
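
The percentile bootstrap used above can be sketched as follows. This is a generic
illustration rather than the authors' procedure: each group is resampled with
replacement, the statistic is recomputed on every resample, and the 2.5th and 97.5th
percentiles together with the sample variance of the replicates are returned. The names
are hypothetical, and the plain Mann-Whitney proportion `auc` stands in for the CAMWE,
which would additionally redo its smoothing steps on each resample of covariate-response
pairs.

```python
import random

def auc(xs, ys):
    """Proportion of pairs with y >= x (plain Mann-Whitney estimate)."""
    return sum(1 for x in xs for y in ys if y >= x) / (len(xs) * len(ys))

def percentile(sorted_vals, q):
    """Empirical q-th percentile (0 <= q <= 100) by linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def bootstrap_band(xs, ys, stat, n_boot=1000, seed=0):
    """Return (2.5th percentile, 97.5th percentile, variance) of bootstrap replicates."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        xb = [rng.choice(xs) for _ in xs]  # resample non-diseased group
        yb = [rng.choice(ys) for _ in ys]  # resample diseased group
        reps.append(stat(xb, yb))
    reps.sort()
    mean = sum(reps) / n_boot
    var = sum((r - mean) ** 2 for r in reps) / (n_boot - 1)
    return percentile(reps, 2.5), percentile(reps, 97.5), var
```

Because `stat` is an arbitrary function of the two resampled lists, the list items may
themselves be (covariate, response) pairs, so the same routine applies when the statistic
depends on the covariate.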

4.2 Real Data Example

We consider the white onions data originally reported by Ratkowski (1983) on the
density-yield relationship of varieties of white Spanish onion grown in various regions
of Australia. The data have been the subject of a nonparametric analysis of covariance
in Young and Bowman (1995). One can see from Figure 3 that the relationship between the
density and yield is non-linear for the two regions considered here: Virginia and Purnong
Landing. A question of interest is whether the two regions of origin for the onions can
be separated simply by looking at the yield. Figure 3 shows that the difference between
yields depends on the density, which will be the covariate under consideration in our
study.

If we apply the method of Faraggi (2003) directly to the data on the original scale, we
observe a large discrepancy between the parametric and nonparametric analyses, as
illustrated by the top panel in Figure 4. We also notice that bootstrapping the data
produces wider 95% confidence bands for large values of the density due to the sparseness
and high variability. But even such confidence bands do not cover the parametric
estimators of the AUC. We should note that, due to the sparseness of observations with
densities larger than 150, we focus on the covariate range (0, 150). On the logarithmic
scale, the relationship between yield and density is more linear, as can be seen from the
bottom panel in Figure 3. In addition, the transformation seems to stabilize the
variance, so it is not unexpected that the difference between the nonparametric approach
and the parametric one diminishes. We can also notice that, on both original and
logarithmic scales, the estimates obtained under the normal assumption are more
conservative, indicating a smaller AUC for small densities. This indicates that the
normal assumptions may not be valid for this dataset and that the nonparametric approach
is more suitable due to its robustness.

5 Conclusions



We introduce nonparametric adjustment for covariate information in the context of ROC
analysis, more specifically for the AUC index. The essential idea in our proposal is that
the conditional ROC curve and all the indexes associated with it (e.g. the Youden Index
(YI) and its optimal cutoff value) can be computed using the statistical model and,
subsequently, the reconstructed working sample. The theoretical properties of the index
estimators deserve further investigation. The approach bears some similarity to the work
on nonparametric adjustment for covariates when estimating a treatment effect, as in
Young and Bowman (1995) and Cantoni and de Luna (2006), and advances in that field are
likely to yield newer results for the ROC covariate adjustment. In contrast to their work
we focus on a generalized Mann-Whitney approach. Our simulations demonstrate the
effectiveness and robustness of the proposed method. While the discussion is limited to
the case of only one covariate, the proposed approach can be extended to multiple
covariates in various ways (e.g., additive models). It is expected that the computational
load will increase significantly with each additional covariate added to the model. In
principle one may consider reasonable parametric approximations suggested by
nonparametric approaches that lead to simpler interpretations. For instance, one
possibility is to use parametric models for the mean and variance functions following the
nonparametrically estimated forms. A similar strategy applies to approximating the
empirical c.d.f. of the noise by parametric functions.

Appendix 1: Local Polynomial Estimators



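Recall that $\{(z_{i,x}, x_i)\}_{1 \le i \le m}$ and $\{(z_{j,y}, y_j)\}_{1 \le j \le n}$
are the nondiseased and diseased samples. The local polynomial regression estimator of
$f(z)$ is obtained by minimizing

$$\sum_{i=1}^{m} \Big\{x_i - \sum_{k=0}^{p} \beta_k (z_{i,x} - z)^k\Big\}^2 K_{h_1}(z_{i,x} - z), \tag{24}$$

where $h_1 = h_1(m)$ is the bandwidth controlling the amount of smoothing, and
$K_{h_1}(\cdot) = K(\cdot/h_1)/h_1$. It is more convenient to work with matrix notation.
Denote the design matrix of (24) by $Z_x$,

$$Z_x = \begin{pmatrix} 1 & (z_{1,x} - z) & \cdots & (z_{1,x} - z)^p \\ \vdots & \vdots & & \vdots \\ 1 & (z_{m,x} - z) & \cdots & (z_{m,x} - z)^p \end{pmatrix},$$

and put $W_{x,h_1} = \mathrm{diag}\{K_{h_1}(z_{i,x} - z) : i = 1, \ldots, m\}$ and
$x = (x_1, \ldots, x_m)^T$. The local polynomial estimator is then given by

$$\hat f(z) = e_1^T (Z_x^T W_{x,h_1} Z_x)^{-1} Z_x^T W_{x,h_1}\, x, \tag{25}$$

where $e_1$ is the $(p+1) \times 1$ unit vector with first entry equal to 1. Analogously,
for the diseased sample $(z_{j,y}, y_j)$, $j = 1, \ldots, n$, the design matrix $Z_y$ and
weight matrix $W_{y,h_2}$ are similarly defined. Letting $y = (y_1, \ldots, y_n)^T$, the
local polynomial estimator for $g$ is
$\hat g(z) = e_1^T (Z_y^T W_{y,h_2} Z_y)^{-1} Z_y^T W_{y,h_2}\, y$.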
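
For the local linear case $p = 1$ the weighted least-squares problem reduces to a
$2 \times 2$ system that can be solved in closed form. The sketch below is a minimal
illustration under assumptions not fixed by the text: it uses the Epanechnikov kernel and
$p = 1$, and the function names are hypothetical. It returns the intercept $\beta_0$,
which corresponds to $e_1^T (Z^T W Z)^{-1} Z^T W x$.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def local_linear(z, covs, resp, h, kernel=epanechnikov):
    """Local linear (p = 1) estimate of the regression function at z.

    Solves the 2x2 weighted least-squares normal equations for
    (beta_0, beta_1) and returns beta_0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for zi, xi in zip(covs, resp):
        d = zi - z
        w = kernel(d / h) / h  # K_h(z_i - z)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * xi
        t1 += w * d * xi
    det = s0 * s2 - s1 * s1
    if s0 == 0.0 or det <= 1e-12 * s0 * s2:
        raise ValueError("too few design points within bandwidth h of z")
    return (s2 * t0 - s1 * t1) / det
```

A useful sanity check is that a local linear fit reproduces exactly linear data, whatever
the bandwidth, as long as at least two distinct design points receive positive weight.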

We next estimate the variance functions $v_1(z)$ and $v_2(z)$ for heteroscedastic errors
according to models (1) and (2). The nonparametric estimators $\hat v_1(z)$ and
$\hat v_2(z)$ are obtained by fitting local polynomial regression to the squared
residuals, i.e., the variance observations $\hat v_{i,x}$ and $\hat v_{j,y}$ as in (12).
Let $b_1 = b_1(m)$ and $b_2 = b_2(n)$ be the sequences of bandwidths for $\hat v_1(z)$
and $\hat v_2(z)$. Denoting $\hat v_x = (\hat v_{1,x}, \ldots, \hat v_{m,x})^T$ and
$\hat v_y = (\hat v_{1,y}, \ldots, \hat v_{n,y})^T$, we have

$$\hat v_1(z) = e_1^T (Z_x^T W_{x,b_1} Z_x)^{-1} Z_x^T W_{x,b_1}\, \hat v_x \quad\text{and}\quad \hat v_2(z) = e_1^T (Z_y^T W_{y,b_2} Z_y)^{-1} Z_y^T W_{y,b_2}\, \hat v_y,$$

where $Z_x$ and $Z_y$ are defined as above,
$W_{x,b_1} = \mathrm{diag}\{K_{b_1}(z_{i,x} - z) : i = 1, \ldots, m\}$ and
$W_{y,b_2} = \mathrm{diag}\{K_{b_2}(z_{j,y} - z) : j = 1, \ldots, n\}$.
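
The two-step construction (smooth the responses, then smooth the squared residuals) can
be sketched as follows. This is an illustrative sketch only: it assumes local linear fits
($p = 1$) with an Epanechnikov kernel, the names are hypothetical, and no guard is
included for windows that contain fewer than two distinct design points.

```python
def local_linear(z, covs, resp, h):
    """Local linear fit at z with an Epanechnikov kernel (returns the intercept)."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for zi, ri in zip(covs, resp):
        d = zi - z
        u = d / h
        w = 0.75 * (1.0 - u * u) / h if abs(u) <= 1.0 else 0.0
        s0, s1, s2 = s0 + w, s1 + w * d, s2 + w * d * d
        t0, t1 = t0 + w * ri, t1 + w * d * ri
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)

def variance_function(z, covs, resp, h, b):
    """Estimate the conditional variance at z.

    Step 1: fit the mean with bandwidth h and form squared residuals.
    Step 2: smooth the squared residuals with bandwidth b.
    """
    fitted = [local_linear(zi, covs, resp, h) for zi in covs]
    sq_resid = [(ri - fi) ** 2 for ri, fi in zip(resp, fitted)]
    return local_linear(z, covs, sq_resid, b)
```

Using separate bandwidths `h` and `b` mirrors the distinction between $h_1$ and $b_1$ in
the text; a local linear fit of squared residuals is not guaranteed to be nonnegative, so
a practical implementation may need to truncate at zero.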



Appendix 2: Auxiliary Results and Proofs 

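Lemma 1 If the assumptions (A1)-(A3), (A6) hold, and $m \to \infty$, for a given $z$,

$$\sqrt{m h_1}\,\{\hat f(z) - f(z),\ \hat v_1(z) - v_1(z)\}^T \xrightarrow{D} N\{b_1(z), \Sigma_1(z)\}, \tag{26}$$

where $b_1(z) = \{b_{11}(z), b_{12}(z)\}^T$ and
$\Sigma_1(z) = \{\sigma_{x,ij}(z)\}_{1 \le i,j \le 2}$ with

$$\sigma_{x,11}(z) = \frac{R(K^*)\, v_1(z)}{e(z)}, \qquad \sigma_{x,22}(z) = \frac{R(K^*)\, \kappa_1(z)}{e(z)\rho_1}, \qquad \sigma_{x,12}(z) = \frac{R(K^*, \rho_1)\, \eta_1(z)}{e(z)\rho_1}.$$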

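Analogously, if the assumptions (A1), (A4)-(A6) hold, and $n \to \infty$, for a given $z$,

$$\sqrt{n h_2}\,\{\hat g(z) - g(z),\ \hat v_2(z) - v_2(z)\}^T \xrightarrow{D} N\{b_2(z), \Sigma_2(z)\}, \tag{27}$$

where $b_2(z) = \{b_{21}(z), b_{22}(z)\}^T$ and
$\Sigma_2(z) = \{\sigma_{y,ij}(z)\}_{1 \le i,j \le 2}$ with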

h^z) = ^^^^d^g^r^-H^), h2{z) = ^^^f^d2pl-'\t'\z), 

$$\sigma_{y,11}(z) = \frac{R(K^*)\, v_2(z)}{e(z)}, \qquad \sigma_{y,22}(z) = \frac{R(K^*)\, \kappa_2(z)}{e(z)\rho_2}, \qquad \sigma_{y,12}(z) = \frac{R(K^*, \rho_2)\, \eta_2(z)}{e(z)\rho_2}.$$

Proof of Lemma 1. The asymptotic normality of $\hat f(z)$ with the bias $b_{11}$ and the
variance $\sigma_{x,11}$ is standard in local polynomial regression. Let
$v^*_{i,x} = \{x_i - f(z_{i,x})\}^2$, and note that the input data satisfy
$\hat v_{i,x} = \{x_i - \hat f(z_{i,x})\}^2 = v^*_{i,x} + 2\{x_i - f(z_{i,x})\}\{f(z_{i,x}) - \hat f(z_{i,x})\} + \{f(z_{i,x}) - \hat f(z_{i,x})\}^2$.
Applying a local polynomial fit to $(z_{i,x}, \hat v_{i,x})$, $i = 1, \ldots, m$, one can
see that the second term will result in a quantity of the order
$O_p\{b_1^{p+1} + 1/\sqrt{m b_1}\}$ and the third term will yield
$O_p\{h_1^{2(p+1)} + 1/(m h_1)\}$. Both quantities are negligible compared to the local
polynomial estimator $v_1^*(z)$ obtained by fitting $(z_{i,x}, v^*_{i,x})$. Therefore the
estimators $\hat v_1(z)$ and $v_1^*(z)$ are asymptotically equivalent with the same limit
distribution. Again we apply the standard argument of local polynomial regression to
obtain the asymptotic normality of $\hat v_1(z)$ with the bias $b_{12}$ and variance
$\sigma_{x,22}$. To derive the covariance of the limit distribution between $\hat f(z)$
and $\hat v_1(z)$, one can equivalently work with $\hat f(z)$ and $v_1^*(z)$. Using the
equivalent kernel notation $K^*$, the limiting covariance is identical to the following,
obtained by employing a Taylor expansion,

cov{f{z)-f{z),Mz)} = ^^J^^^^i J K*{u)K*{u/p,)du^,{z)+0{h)}. 
where 

^ m _ m 

The same arguments can be applied to obtain the joint asymptotic distribution in (27).

Proof of Theorem 1. The Cramer-Wold device is exploited to derive the asymptotic
distributions of $\hat A_N(z)$ for the three possible cases; the detailed proof is
omitted for conciseness.

Lemma 2 If the assumptions (A1$^+$)-(A3$^+$), (A6$^+$) and (A7.1) hold, and
$m \to \infty$,

$$\sup_z |\hat f(z) - f(z)| = O(\tau_m), \qquad \sup_z |\hat v_1(z) - v_1(z)| = O(\tau_m), \qquad \text{w.p.1}, \tag{28}$$

and if the assumptions (A1$^+$), (A4$^+$)-(A6$^+$) and (A7.2) hold, and $n \to \infty$,

$$\sup_z |\hat g(z) - g(z)| = O(\omega_n), \qquad \sup_z |\hat v_2(z) - v_2(z)| = O(\omega_n), \qquad \text{w.p.1}, \tag{29}$$

where $\tau_m = h_1^{p+1} + \sqrt{\log(1/h_1)/(m h_1)}$ and
$\omega_n = h_2^{p+1} + \sqrt{\log(1/h_2)/(n h_2)}$ as defined in Theorem 2.

Proof of Lemma 2. It is sufficient to show (28). The strong uniform convergence rate
$\tau_m$ for $\hat f$ was obtained by Horng (2006), based on the arguments in Silverman
(1978) and Mack and Silverman (1982) and the equivalent kernel representation. We follow
an argument similar to that used in the proof of Lemma 1. Recall that
$v^*_{i,x} = \{x_i - f(z_{i,x})\}^2$ and
$\hat v_{i,x} = \{x_i - \hat f(z_{i,x})\}^2 = v^*_{i,x} + 2\{x_i - f(z_{i,x})\}\{f(z_{i,x}) - \hat f(z_{i,x})\} + \{f(z_{i,x}) - \hat f(z_{i,x})\}^2$.
Applying a local polynomial fit to $(z_{i,x}, \hat v_{i,x})$, $i = 1, \ldots, m$, the
second and third terms of the resulting estimator tend to 0 with probability 1, and the
leading term has the strong uniform convergence rate by the same argument as for
$\hat f$.
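
The rate $h^{p+1} + \sqrt{\log(1/h)/(Nh)}$ appearing in Lemma 2 combines a bias-type term
that grows with the bandwidth and a stochastic term that shrinks with it. The short
sketch below simply evaluates this expression; the function name and the chosen grid of
bandwidths are illustrative assumptions, and the default $p = 1$ corresponds to a local
linear fit.

```python
import math

def uniform_rate(n_obs, h, p=1):
    """Evaluate h^(p+1) + sqrt(log(1/h) / (n_obs * h)) for a bandwidth 0 < h < 1."""
    if not 0.0 < h < 1.0:
        raise ValueError("bandwidth h must lie in (0, 1)")
    return h ** (p + 1) + math.sqrt(math.log(1.0 / h) / (n_obs * h))

# Tabulate the rate for the two sample sizes used in the simulations.
for h in (0.05, 0.1, 0.2, 0.4, 0.8):
    print(h, round(uniform_rate(40, h), 3), round(uniform_rate(100, h), 3))
```

For a fixed bandwidth the value decreases as the sample size grows, while for a fixed
sample size very small and very large bandwidths are both penalized.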

Proof of Theorem 2. The proof follows from Lemma 2 and the uniform version of Slutsky's
Theorem. One only needs to note that, if (A2$^+$) and (A4$^+$) hold,
$A_N = \Phi(f, g, v_1, v_2)$ has a bounded partial derivative in each argument, and thus
satisfies Lipschitz continuity.

Proof of Theorem 3. For a given $Z = z$, one can see that the "hypothetical" estimator
$\tilde A_M(z)$ is in fact a two-sample U-statistic, so the arguments used in the theory
of U-statistics can be applied here. For the asymptotic variance at a given $z$, put
$h(X, Y; z) = 1_{[0,\infty)}(Y - X \mid Z = z) - A(z)$,
$h_{0,0} = E\{h(X, Y; z)\} = 0$, $h_{1,0}(X; z) = E\{h(X, Y; z) \mid X\}$ and
$h_{0,1}(Y; z) = E\{h(X, Y; z) \mid Y\}$. Note that

$$h_{0,1}(Y; z) = P(Y > X \mid Y, Z = z) - A(z)$$
$$= P\{f(z) + e_1\sqrt{v_1(z)} < g(z) + e_2\sqrt{v_2(z)} \mid e_2\} - A(z)$$
$$= F_1\Big\{\frac{g(z) - f(z)}{\sqrt{v_1(z)}} + \frac{\sqrt{v_2(z)}}{\sqrt{v_1(z)}}\, e_2\Big\} - A(z) = h_{0,1}(e_2; z),$$

where $F_1$ denotes the c.d.f. of $e_1$, and similarly $h_{1,0}(X; z) = h_{1,0}(e_1; z)$,
i.e., $\xi^2_{1,0} = \mathrm{var}\{h_{1,0}(X; z)\}$ and
$\xi^2_{0,1} = \mathrm{var}\{h_{0,1}(Y; z)\}$ as specified in Theorem 3. The unbiasedness
of $\tilde A_M(z)$ is obvious from $h_{0,0} = 0$. For the variance calculation, after
some counting, one has

$$\mathrm{var}\{\tilde A_M(z)\} = \frac{1}{mn} \sum_{c=0}^{1} \sum_{d=0}^{1} C_1^c\, C_{m-1}^{1-c}\, C_1^d\, C_{n-1}^{1-d}\, \xi^2_{c,d} = \frac{\xi^2_{1,0}}{m} + \frac{\xi^2_{0,1}}{n} + o\Big(\frac{1}{m+n}\Big), \tag{30}$$

where $C_n^k$ is the number of combinations of $k$ items chosen from $n$. This proves (18).
To show the asymptotic normality (19), define

$$T_{m,n}(z) = \sqrt{m+n}\, \Big\{\frac{1}{m} \sum_{i=1}^{m} h_{1,0}(x_{i,z}; z) + \frac{1}{n} \sum_{j=1}^{n} h_{0,1}(y_{j,z}; z)\Big\},$$

which is in fact the projection of $\sqrt{m+n}\,\{\tilde A_M(z) - A(z)\}$ on the space
formed by random variables of the form
$\sum_{i=1}^{m} \psi(x_{i,z}) + \sum_{j=1}^{n} \psi^*(y_{j,z})$, where $\psi$ and
$\psi^*$ are arbitrary measurable functions. From Hajek's Projection Theorem and (30), we
have, as $m, n \to \infty$,

$$\mathrm{var}\{\sqrt{m+n}\, \tilde A_M(z) - T_{m,n}(z)\} = \mathrm{var}\{\sqrt{m+n}\, \tilde A_M(z)\} - \mathrm{var}\{T_{m,n}(z)\} \to 0,$$

which, together with unbiasedness, implies that $\sqrt{m+n}\,\{\tilde A_M(z) - A(z)\}$ is
asymptotically equivalent to $T_{m,n}(z)$. Then, by the central limit theorem, when
$n/(m+n) \to \lambda^*$ and $\min\{\xi^2_{1,0}(z), \xi^2_{0,1}(z)\} > 0$, $T_{m,n}(z)$
has the limiting distribution specified in (19). So does
$\sqrt{m+n}\,\{\tilde A_M(z) - A(z)\}$.
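
The variance expansion of the two-sample U-statistic can be verified exactly on a toy
example. The sketch below is illustrative and rests on stated assumptions: $X$ and $Y$
are taken to be uniform on small finite supports so that every sample can be enumerated,
the kernel is the Mann-Whitney indicator $1(y \ge x)$, and the finite-sample identity
$\mathrm{var} = \{(n-1)\xi^2_{1,0} + (m-1)\xi^2_{0,1} + \xi^2_{1,1}\}/(mn)$ is the
standard form for a kernel of degree (1, 1), with $\xi^2_{1,1}$ the variance of the
kernel itself. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def check_variance_identity(x_vals, y_vals, m, n):
    """Return (exact variance, formula value) for the Mann-Whitney statistic
    when X and Y are uniform on the finite supports x_vals and y_vals."""
    kern = lambda x, y: Fraction(1 if y >= x else 0)
    px, py = Fraction(1, len(x_vals)), Fraction(1, len(y_vals))
    auc = sum(kern(x, y) for x in x_vals for y in y_vals) * px * py

    # Variance components of the centred kernel.
    h10 = [sum(kern(x, y) for y in y_vals) * py - auc for x in x_vals]
    h01 = [sum(kern(x, y) for x in x_vals) * px - auc for y in y_vals]
    xi10 = sum(v * v for v in h10) * px
    xi01 = sum(v * v for v in h01) * py
    xi11 = auc * (1 - auc)
    formula = ((n - 1) * xi10 + (m - 1) * xi01 + xi11) / (m * n)

    # Exact variance by enumerating every equally likely pair of samples.
    first = second = Fraction(0)
    count = 0
    for xs in product(x_vals, repeat=m):
        for ys in product(y_vals, repeat=n):
            u = sum(kern(x, y) for x in xs for y in ys) / (m * n)
            first += u
            second += u * u
            count += 1
    exact = second / count - (first / count) ** 2
    return exact, formula
```

Because exact rational arithmetic is used, the two returned values can be compared with
`==`; the leading terms $\xi^2_{1,0}/m + \xi^2_{0,1}/n$ then follow by letting $m$ and
$n$ grow.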

Proof of Theorem 4. Define $w_{ij} = y_{j,z} - x_{i,z}$ and
$\hat w_{ij} = \hat y_{j,z} - \hat x_{i,z}$, where the dependences of $w_{ij}$ and
$\hat w_{ij}$ on $x_{i,z}$, $y_{j,z}$, $z_{i,x}$, $z_{j,y}$ and $z$ are suppressed for
simplicity. Let $a_1(z) = g(z) - f(z)$,
$a_2(z_{j,y}, z) = \sqrt{v_2(z)/v_2(z_{j,y})}$,
$a_3(z_{i,x}, z) = -\sqrt{v_1(z)/v_1(z_{i,x})}$,
$a_4(z_{j,y}, z) = -g(z_{j,y})\, a_2(z_{j,y}, z)$ and
$a_5(z_{i,x}, z) = -f(z_{i,x})\, a_3(z_{i,x}, z)$, and then

$$w_{ij} = a_1(z) + a_2(z_{j,y}, z)\, y_j + a_3(z_{i,x}, z)\, x_i + a_4(z_{j,y}, z) + a_5(z_{i,x}, z),$$
$$\hat w_{ij} = \hat a_1(z) + \hat a_2(z_{j,y}, z)\, y_j + \hat a_3(z_{i,x}, z)\, x_i + \hat a_4(z_{j,y}, z) + \hat a_5(z_{i,x}, z),$$

where the hat is the generic notation for estimated quantities. By analogy to the proof
of Lemma 2 with the assumptions (A3$^+$) and (A5$^+$) replaced by (A3*) and (A5*), we
obtain weak (in probability) uniform consistency of $\hat f$, $\hat g$, $\hat v_1$ and
$\hat v_2$. This is sufficient for our purpose, for reasons explained below. Again by
analogy to the proof of Theorem 2 with the uniform version of Slutsky's Theorem (in
probability instead of almost surely), we have, for a given $z$,
$\hat a_1(z) \to_p a_1(z)$,
$\sup_{z_{j,y}} |\hat a_k(z_{j,y}, z) - a_k(z_{j,y}, z)| = o_p(1)$ and
$\sup_{z_{i,x}} |\hat a_l(z_{i,x}, z) - a_l(z_{i,x}, z)| = o_p(1)$, for $k = 2, 4$ and
$l = 3, 5$. Since $e_{1,i} \sim F_1$, one has $e_{1,i} = O_p(1)$ and, analogously,
$e_{2,j} = O_p(1)$, regardless of $i$ and $j$. Also note that $f$, $g$, $v_1$ and $v_2$
are bounded on $\mathcal{Z}$; we then obtain
$\sup_{i,j} |\hat w_{ij} - w_{ij}| = o_p(1)$, which depends only on the given $z$.

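To show (20), we observe that
$E[\{\hat A_M(z) - \tilde A_M(z)\}^2] = E_{0,0} + E_{1,0} + E_{0,1} + E_{1,1}$, where

$$E_{0,0} = \frac{1}{m^2 n^2} \sum_{i \ne i',\, j \ne j'} E\big[\{1_{[0,\infty)}(\hat w_{ij}) - 1_{[0,\infty)}(w_{ij})\}\{1_{[0,\infty)}(\hat w_{i'j'}) - 1_{[0,\infty)}(w_{i'j'})\}\big],$$

while $E_{1,0}$, $E_{0,1}$ and $E_{1,1}$ are defined in the same way, with $E_{1,0}$
corresponding to $\sum_{i = i',\, j \ne j'}$, $E_{0,1}$ to $\sum_{i \ne i',\, j = j'}$
and $E_{1,1}$ to $\sum_{i = i',\, j = j'}$. We first focus on $E_{0,0}$,

$$E_{0,0} = \frac{1}{m^2 n^2} \sum_{i \ne i',\, j \ne j'} \big\{P(\hat w_{ij} \ge 0, \hat w_{i'j'} \ge 0) + P(w_{ij} \ge 0, w_{i'j'} \ge 0) - P(\hat w_{ij} \ge 0, w_{i'j'} \ge 0) - P(w_{ij} \ge 0, \hat w_{i'j'} \ge 0)\big\}$$
$$\le \sup_{i \ne i',\, j \ne j'} \big|P(\hat w_{ij} \ge 0, \hat w_{i'j'} \ge 0) + P(w_{ij} \ge 0, w_{i'j'} \ge 0) - P(\hat w_{ij} \ge 0, w_{i'j'} \ge 0) - P(w_{ij} \ge 0, \hat w_{i'j'} \ge 0)\big|. \tag{31}$$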

For any given $z$, from Slutsky's Theorem, we have that
$(\hat w_{ij}, \hat w_{i'j'})^T$, $(\hat w_{ij}, w_{i'j'})^T$ and
$(w_{ij}, \hat w_{i'j'})^T$ converge in probability to $(w_{ij}, w_{i'j'})^T$ uniformly
in all arguments except $z$, which implies uniform convergence in distribution. Therefore
the four sequences of probabilities in (31) all converge uniformly to
$P(w_{ij} \ge 0, w_{i'j'} \ge 0)$ as $m, n \to \infty$, which leads to $E_{0,0} \to 0$.
From the above argument, one can see that weak uniform consistency is sufficient, and
also that the convergence rates cannot be preserved when evaluating upper bounds for
those probability differences. Using similar arguments, it is easy to show that
$E_{1,0} = O(E_{0,0}/m)$, $E_{0,1} = O(E_{0,0}/n)$ and $E_{1,1} = O\{E_{0,0}/(mn)\}$.
This completes the proof of Theorem 4.

Figure 1: Top row: Simulation results for the three models with normal (left), Student-t
with 3 degrees of freedom (middle) and log-normal errors (right). Shown are Monte Carlo
averages of the mean squared errors (MSE) of three estimators, $\hat A_M$ (CAMWE, solid),
$\hat A_N$ (Normal, dash-dotted) and $\hat A_K$ (Kernel, dashed), at different values of
$z$ with moderate sample sizes $n = m = 40$. Bottom row: The simulation results in the
same scenarios with larger sample sizes $n = m = 100$.

Figure 2: Simulation results of 95% pointwise bootstrap confidence bands (top row) and
variance comparisons (bottom row) for three models with normal (left), Student-t with 3
degrees of freedom (middle) and log-normal (right) noise, with the same settings as in
Figure 1 and sample sizes $n = m = 40$. Top row: True AUC and 95% pointwise Monte Carlo
bands (solid) obtained from 500 runs, and the Monte Carlo averages of 95% pointwise
bootstrap bands (dashed). Bottom row: Monte Carlo variance estimates (solid) obtained
from 500 runs, and the Monte Carlo averages of bootstrap variance estimates (dashed), as
described in Section 4.1.

Figure 3: Spanish Onion Data with response on the original scale (top) and the
logarithmic scale (bottom), with the smooth estimates of the mean functions for two
populations, Purnong Landing (solid) and Virginia (dashed).

Figure 4: Top panel: Comparison of the estimated functional relationship between AUC and
density obtained using the nonparametric approach with and without the normal noise
assumption, denoted by Normal and CAMWE respectively, with the parametric estimate
following Faraggi (2003). Also shown are the 95% pointwise confidence bands obtained from
the nonparametric bootstrap method. Bottom panel: Same comparison as in the top panel
with response on the logarithmic scale.